ABSTRACT

One of the most appropriate quasi-experimental
approaches to compensatory education is the regression-discontinuity
design. However, it remains underutilized, in part because of the
need to clarify the link between the mathematical model and
administrative decision-making. This paper explains the derivation of
a program efficiency index congruent with the
regression-discontinuity design. The efficiency index is based on a
confidence interval calculated for expected mean posttest 
performance. If the observed posttest mean corresponds exactly to the 
upper limit of the confidence interval, the efficiency index is +1 
(optimal efficiency). If the observed posttest mean falls at the 
lower confidence boundary, the index value is zero (minimal 
efficiency). Data from a remedial mathematics program (grades 2, 3, 
7, and 8) illustrate the possibilities for: (1) net growth (index 
above 1); (2) breakdown (index below 0); and (3) gradations of 
maintenance (index between 0 and 1). As conceived, the efficiency 
index is comparable to eta-square, the correlation ratio. Thus, its 
analytic context differs from that of the effect size coefficient 
commonly associated with classical control group design. A model is 
presented showing how variations in the size of the efficiency index 
may lead to different decision-making options. (Author/LPG)

Evaluation Technique and Program Efficiency Measures:
Statistical Derivations for the Regression Discontinuity Design

Abstract

One of the most appropriate quasi-experimental approaches to compensatory
education is the regression-discontinuity design. However, it remains
underutilized, and some suggest that its utility to program evaluation could
be enhanced if the link were made clearer between its mathematical rationale
and the process of administrative decision-making (Linn, 1981). This paper
explains the derivation of a program efficiency index congruent with the
regression-discontinuity design. As conceived, the efficiency index is
comparable to eta-square, the correlation ratio. Thus, its analytic context
differs from that of the effect size coefficient (Cohen, 1969) commonly
associated with the classical control group design. We will further show how
variations in the size of the efficiency index may lead to different
decision-making options.




One of the most appropriate quasi-experimental approaches to evaluating
compensatory education is the regression-discontinuity design. However, it
remains underutilized, and some suggest that its utility to program evaluation
could be enhanced if the link were made clearer between its mathematical
rationale and the process of administrative decision-making (Linn, 1981).
This paper explains the derivation of a program efficiency index congruent
with the regression-discontinuity design. As conceived, the efficiency index
is comparable to eta-square, the correlation ratio. Thus, its analytic
context differs from that of the effect size coefficient (Cohen, 1969)
commonly associated with the classical control group design. We will further
show how variations in the size of the efficiency index may lead to different
decision-making options.

Perspective

Evaluation Design

The regression-discontinuity (Campbell and Stanley, 1966) is a
quasi-experimental design appropriate for situations where there is a known
interaction between treatment assignment and ability (achievement, aptitude,
etc.). It has emerged in recent years as one of the most promising
quantitative models for the evaluation of compensatory education. Based on the
criterion of internal validity, the regression-discontinuity design has been
shown to be superior to the norm-referenced model (Linn, 1981), since there
often are multiple academic and contextual differences between the remedial
group under study and the national sample from which test norms are developed.
Based on the criterion of feasibility, the regression-discontinuity design has
been found preferable to the classical experimental/control group approach,
since it is impractical or unethical, in many instances, to withhold
needed services from students in order to set up a comparison group (Wolf,
1981). Beyond the issue of applicability, the design may be most desirable
1) when assignment to the 'treatment' group is based on a definite
cutoff score, i.e., all students with a pretest score below a certain mark
participate in the remedial program, while those above are exempted from it;
2) when the educational environment includes multiple 'treatments,' and there
is a need to separate the impact of the remedial, supplementary intervention
from that of the general program of instruction. To determine the treatment's
effectiveness, the task of the evaluator is to estimate what the performance
level of the low achieving group would be without the remedial support, and
then to test whether the actual score for that group is significantly
different from the expected value.

Two variants of this design exist. In the strict regression-discontinuity
approach, separate pretest-posttest regression lines are obtained for the
group above and the group below the cutoff point. Then two predicted values
for that pretest cutoff score are calculated, by fitting it into each
regression equation. A discontinuity in the regression lines, i.e., a
difference between the predicted cutoff values, if significant, is taken as
a measure of program impact. Tallmadge, Horst, and Wood (1975) propose a
modification of the original technique that may be more sensitive to a
possible pretest/program interaction among the low achieving students. In this
version, known as regression-projection, the relationship between the pretest
and the posttest is calculated only for the group of students above the cutoff
score. Then, assuming linearity over the entire range of pretest scores, a
single regression coefficient is used to estimate what the remedial group's
posttest mean would have been under a 'no-treatment' condition. The formula
for making such an estimate reads as:



     \[ E(\bar{Y}_t) = \bar{Y}_c + b_c (\bar{X}_t - \bar{X}_c) \]

[Insert Figure 1 here] 

It simply means that the difference between the high achieving and the
low achieving group on the posttest is expected to be the same as it was on
the pretest, except for the imperfect correlation between the two measures.
Any discrepancy between the projected and the observed posttest mean is
attributed to the remedial treatment. The two versions of the regression
design are illustrated in Figure 1. The details of the statistical test to
establish significance of the differences can be found in Sween (1971) for the
regression-discontinuity, and in Tallmadge and Horst (1976) for the
regression-projection.
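
The regression-projection estimate can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function and
argument names are hypothetical, and the slope is taken as b_c = r (S_y / S_x)
computed on the comparison group, which is the form used in the grade 7
computation reported later in this paper. The inputs are the grade 7 values
from Table 1.

```python
def projected_posttest_mean(pre_t, pre_c, post_c, sd_pre_c, sd_post_c, r_c):
    """Expected 'no-treatment' posttest mean for the treatment group.

    pre_t, pre_c         -- pretest means of the treatment and comparison groups
    post_c               -- posttest mean of the comparison group
    sd_pre_c, sd_post_c  -- comparison-group standard deviations (pre, post)
    r_c                  -- comparison-group pre-post correlation
    """
    # Assumption: slope of the comparison-group regression line, b = r * Sy / Sx.
    b_c = r_c * sd_post_c / sd_pre_c
    return post_c + b_c * (pre_t - pre_c)


# Grade 7 values as reported in Table 1 of this paper.
expected = projected_posttest_mean(
    pre_t=30.64, pre_c=57.49, post_c=55.03,
    sd_pre_c=12.01, sd_post_c=17.00, r_c=0.77,
)
print(round(expected, 2))
```

With the rounded inputs published in Table 1, the result agrees with the
reported projection of 25.78 NCE to within about .01.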

Statistical Analysis

The statistical tests offered to accompany the regression designs result in
the usual t-value. But, as has been pointed out by many authors (Cohen,
1969; Hays, 1973), the emergence of a statistically significant value does not
truly reveal the strength of the relationship between the independent and the
dependent variable. The information provided by the index of significance is
particularly limited when the hypothesis-testing paradigm is adopted.
Hypothesis testing, however, is only one means of deriving statistical
inference. As stated by Hays (1973), "in many circumstances," (and evaluation
seems to be exactly one of these circumstances), "the primary purpose of data
collection is not to test a hypothesis, but rather to obtain an estimate of
some parameter" (p. 375). A range of values may be more useful or more stable
than a single, unqualified estimate, given the presence of sampling error
affecting most research data. Rather than just ignoring the sampling error, an
evaluator can place him/herself on safer ground by dealing straightforwardly
with it, when drawing a conclusion about program effectiveness. To do that,
one can turn to another form of statistical inference, the calculation of a
confidence interval.

Ordinarily, in regression analysis, it is possible to establish confidence
intervals for three different parameters: the regression coefficient
itself, the actual score of an individual on the criterion measure, or the
predicted value of a particular pretest score. Given the critical role
accorded to the predicted mean value in the regression design, the calculation
of the confidence interval is most necessary for that parameter. To obtain
the boundaries of the confidence interval, i.e., the critical values for the
expected mean, one can use the following formula adapted from Hays (1973):

     \[ Y'_t \pm t_{\alpha/2} \, (\text{est } \sigma_{y \cdot x})
        \sqrt{\frac{1}{N} + \frac{(\bar{X}_t - \bar{X}_c)^2}{N S_x^2}} \]

where: Y'_t                   = Predicted posttest mean for the treatment group
       \bar{X}_t              = Mean of the treatment group on the pretest
       \bar{X}_c              = Mean of the control group on the pretest
       est \sigma_{y \cdot x} = The standard error of estimate adjusted by the
                                sample size
       N                      = Sample size of the control group
       S_x                    = Standard deviation of the control group on the
                                pretest

For the t-value, any probability may be retained by the evaluator, depending
on the desired level of confidence.

If the actual posttest mean for the treatment group does not fall within
the calculated interval, one can be 95 percent confident that 'something
extraordinary' is happening with the program. If the observed mean is above
the upper limit of the confidence interval, the impact of the program is
definitely positive. On the other hand, if the observed mean is below the
lower limit of the confidence interval, the return on the program is clearly
not what one would expect. As one can see, the procedure is quite unequivocal
about the extreme cases. One may say that it also increases the likelihood of
arriving at a nonsignificant difference. But even within the region of
nonsignificance, it is possible to set up a gradient of performance, which
allows the evaluator to draw inferences not just about goal attainment, but
also the level at which a program operates. Indeed, all the bits of
information obtained from the standard statistical analysis can be condensed
into one measure that we call the efficiency index. The term efficiency speaks
of the average amount of progress made by the treatment group participants,
relative to their own entry level and that of students in the control group.
Mathematically, it is calculated according to the following formula:

     \[ E = \frac{y - y'}{|y' - y''| + |y - y'|} + .5 \]

where: (y - y')   = the difference between the observed (y) and the
                    expected (y') posttest mean for the treatment group
       (y' - y'') = the difference between the expected mean (y')
                    and its critical value (y'')

The absolute value of (y' - y'') represents the distance from the lower
limit of the confidence interval to its center, while the absolute value of
(y - y') represents the distance from that center in either direction.
Therefore, the first term in the mathematical expression simply defines the
"gain" at posttest time corrected for uncertainty, i.e., the relative
difference between the observed posttest score and the lower limit of the
confidence interval; .5 is added simply to further facilitate interpretation.
Indeed, if the observed and the predicted posttest means coincide, the
efficiency index will take the value of .5. If the observed posttest mean
corresponds exactly to the upper limit of the confidence interval, the
efficiency index will take the value of +1. If the observed posttest mean
falls precisely at the lower boundary of the confidence interval, the
efficiency index will take the value 0.
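
The three reference values just described can be verified with a short Python
sketch of the index. The function name and its arguments are hypothetical, and
the numbers in the calls are illustrative rather than data from the study;
half_width stands for |y' - y''|, the distance from the expected mean to its
critical value.

```python
def efficiency_index(observed, expected, half_width):
    """Efficiency index: gain relative to the interval, shifted by .5.

    observed   -- observed posttest mean of the treatment group (y)
    expected   -- projected 'no-treatment' posttest mean (y')
    half_width -- |y' - y''|, half the width of the confidence interval
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    gain = observed - expected
    return gain / (half_width + abs(gain)) + 0.5


# Illustrative case: expected mean of 20 with a half-width of 5.
print(efficiency_index(20.0, 20.0, 5.0))        # observed = expected    -> 0.5
print(efficiency_index(25.0, 20.0, 5.0))        # observed = upper limit -> 1.0
print(efficiency_index(15.0, 20.0, 5.0))        # observed = lower limit -> 0.0
print(efficiency_index(30.0, 20.0, 5.0) > 1.0)  # beyond the upper limit -> True
```

Because the gain appears in both numerator and denominator, the index as
written is bounded: it stays between -.5 and 1.5 however far the observed mean
lies from the expected one.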

Although the derivation of such an index may seem elaborate, its merit is
that it tremendously simplifies the reporting of evaluation results to program
administrators. That advantage can be appreciated when one has to deal with a
program implemented at several grade levels. Whenever the efficiency index is
greater than 1, the program is probably exemplary; whenever the efficiency
index is negative, the program is probably in trouble. Even when the index
falls between 0 and 1 (in other words, no statistical significance is
obtained), it is still possible to call attention to different degrees of
efficiency; in that sense, the procedure gets around the no-significant
difference problem, the lack of sensitivity, that Stufflebeam et al. (1971)
found as a frequent limitation of evaluation techniques.

The whole procedure is illustrated below with actual data obtained at
four grade levels (2, 3, 7, and 8) for a remedial math program.

In grade 7, for example, students with a pretest score lower than 38 NCEs
(29th percentile rank) were assigned to the remedial program. The average
pretest score for this low achieving group was 30.64 NCE, compared to a mean
of 57.49 for students not participating in the program. Based on the
regression analysis, it was projected that the posttest performance for
students in the first group would be around 25.8 NCE, in the absence of the
remedial program.

     \[ Y' = 55.03 + .77 \left( \frac{17.00}{12.01} \right) (30.64 - 57.49) = 25.78 \]

A 95 percent confidence interval was calculated, that extends ± 7.04 NCE
points around that central value.

     \[ 25.78 \pm (2.001)(11.03)
        \sqrt{\frac{1}{59} + \frac{(30.64 - 57.49)^2}{59 \times (12.01)^2}}
        = 25.78 \pm 7.04 \]
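
The half-width of this interval can be checked numerically. The sketch below
is a reconstruction under stated assumptions, not the authors' own program:
the names are hypothetical, the critical t-value (2.001) is copied from the
computation above rather than looked up, and the "adjustment by the sample
size" of the standard error of estimate is assumed to be
S_y sqrt(1 - r^2) sqrt(N / (N - 2)), which reproduces the value of 11.03 used
above.

```python
import math


def standard_error_of_estimate(sd_post, r, n):
    """Sample-size-adjusted standard error of estimate.

    Assumption: est sigma_y.x = Sy * sqrt(1 - r^2) * sqrt(N / (N - 2)).
    """
    return sd_post * math.sqrt(1.0 - r ** 2) * math.sqrt(n / (n - 2))


def ci_half_width(t_crit, se_est, n, pre_t, pre_c, sd_pre):
    """Half-width of the confidence interval around the projected mean."""
    spread = 1.0 / n + (pre_t - pre_c) ** 2 / (n * sd_pre ** 2)
    return t_crit * se_est * math.sqrt(spread)


# Grade 7 comparison-group values from Table 1; t = 2.001 as used in the text.
se = standard_error_of_estimate(sd_post=17.00, r=0.77, n=59)
half_width = ci_half_width(t_crit=2.001, se_est=se, n=59,
                           pre_t=30.64, pre_c=57.49, sd_pre=12.01)
print(round(se, 2), round(half_width, 2))
```

Both quantities come out within rounding of the reported 11.03 and ± 7.04. A
different confidence level would only change the critical t-value passed in.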

The observed posttest mean for the treatment group was 34.02, and fell outside
of the confidence interval. It actually exceeded its upper limit by 1.20 NCE.
That difference can be translated into an efficiency index equal to:

     \[ E = \frac{34.02 - 25.78}{|7.04| + |34.02 - 25.78|} + .5 = 1.039 \]

Clearly, the impact of the program is strongly positive at that grade level,
for the average participating student.

[Insert Table 1 here] 

The calculations for the other grade levels can be carried out in similar
fashion.

Significance

To understand the utility of the efficiency index, we can show its
relationship to other measures of treatment effectiveness, and to the
administrative decision-making process.
A - Treatment Effectiveness 

There exist several coefficients to indicate the impact of treatment on
performance. They are mainly conceived in terms of the percentage of variance
in the dependent variable accounted for by the treatment. In the framework of
analysis of variance, when two conditions are involved and equal variances are
assumed, the most appropriate indicator of impact may be omega-square
(Hays, 1973). In the framework of regression analysis, when a linear model
may not entirely fit the data, the best suited measure of strength of
association is eta-square, also called the correlation ratio. From these basic
coefficients, one can derive other descriptive statistics that express the
impact of a treatment in direct units of measurement rather than as a
proportion. The effect size coefficient proposed by Cohen (1969) is such an
indicator which clearly branches from omega-square. It expresses the
difference between the means of a treatment and a control group in terms of
the standard deviation (of the control group). The efficiency index proposed
in this paper is more directly related to eta-square, the correlation ratio.
Let's recall that:

     \[ \eta^2_{y \cdot x} =
        \frac{\sum_j n_j (M_{yj} - M_y)^2}{\sum_j \sum_i (Y_{ij} - M_y)^2} \]

where the numerator stands for "the sum of squares between groups," and the
denominator for the "total sum of squares." That denominator can be
rewritten as:

     \[ \sum_j \sum_i [(Y_{ij} - M_{yj}) + (M_{yj} - M_y)]^2 \]

The correlation ratio thus becomes:

     \[ \eta^2_{y \cdot x} =
        \frac{\sum_j n_j (M_{yj} - M_y)^2}
             {\sum_j \sum_i [(Y_{ij} - M_{yj}) + (M_{yj} - M_y)]^2} \]

Except for the summation signs and the power transformation, one can see
that this mathematical expression is perfectly analogous to the ratio used in
computing the efficiency index.
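
The decomposition behind the correlation ratio can be illustrated with a small
Python sketch. The function name and the two groups of scores are hypothetical
and serve only to show that the total sum of squares splits into a
within-group part and a between-group part, so that eta-square has the same
"part over part-plus-remainder" form as the ratio in the efficiency index.

```python
def eta_squared(groups):
    """Return (eta_square, ss_between, ss_within, ss_total) for lists of scores."""
    scores = [y for group in groups for y in group]
    grand_mean = sum(scores) / len(scores)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # n_j * (M_yj - M_y)^2: contribution of this group to SS between.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Sum of (Y_ij - M_yj)^2: contribution of this group to SS within.
        ss_within += sum((y - group_mean) ** 2 for y in group)

    ss_total = sum((y - grand_mean) ** 2 for y in scores)
    return ss_between / ss_total, ss_between, ss_within, ss_total


# Hypothetical scores for two groups (not data from the study).
eta2, ss_b, ss_w, ss_t = eta_squared([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(ss_b, ss_w, ss_t)  # between, within, and total sums of squares
print(round(eta2, 4))
```

In this example the between-group and within-group sums of squares add up to
the total, which is the identity used to rewrite the denominator above.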

The preceding discussion already points to the differences between the
efficiency index (EI) and the effect size (ES):

a) Computationally, their mathematical roots are distinct, with the
   former being linked to eta-square while the latter branches out
   from omega-square.

b) In terms of magnitude, the effect size coefficient expresses the
   distance between the mean of a treatment group and that of a
   control group, while the efficiency index measures the distance
   from the observed mean to the lower limit of the expected mean.

c) More importantly, the two measures belong to different contexts
   of analysis. There are some serious questions regarding the
   application of the effect size coefficient in situations calling
   for the regression-discontinuity design. Indeed, that design is
   equivalent to a repeated-measures experimental condition, in which
   each subject receives the two available treatments (regular and
   remedial instruction). The two scores being compared (the
   predicted value and the observed value for the treatment group)
   cannot be considered entirely independent. To that extent, some
   limitations are placed on the ANOVA framework and its associated
   statistics. The effect size coefficient, as we have seen, falls
   in that category. All this is to say that while the effect size
   maintains its legitimacy in the regular experimental-control
   group design, the efficiency index seems preferable with the
   regression-discontinuity design.

B - Management Information

Two questions need to be addressed now: 1) How does one convey that kind
of complex information to administrators in a handy way? 2) How does one
increase the probability that the reported information will indeed be
included in the decision-making process?

1 - Making it Accessible

Information on a program's efficiency may be reported in a modified
scattergram as follows. The horizontal axis shows the pretest scores (say in
NCEs) with a clear mark for the cutoff point; the vertical axis shows
different values of the efficiency index. One can divide the area delineated
by these axes into three subfields, by drawing two lines at points 1 and 0,
perpendicular to the efficiency axis. The top line, at point 1, corresponds
of course to the upper limit of the confidence interval calculated; it can be
referred to as the optimal efficiency line. The bottom line, at point 0,
corresponds to the lower limit of the confidence interval calculated; it may
be referred to as the minimal efficiency line. The subfield above the optimal
efficiency line is designated a net growth area; the subfield between the
optimal and the minimal efficiency lines is designated as a maintenance area;
the subfield below the minimal efficiency line is designated as a breakdown
area. The points in the scatterplot represent the various sites or grade
levels at which the program was implemented. If at a particular grade level
the actual posttest mean falls within the confidence interval for the
predicted mean, that observation will appear between the two efficiency lines;
this will suggest that the remedial program is operating as a maintenance
unit, whose utility is to prevent the deterioration of skills, and thus
sustain the operation of the regular instructional program; in other words,
without it, the regular program of instruction may not be able to function
with any kind of efficacy. If at another grade level the posttest mean
exceeds the upper limit of the confidence interval, that observation will
appear above the optimal efficiency line; this will suggest that the remedial
program is operating as a production unit, capable of creating a net growth in
students' competence. If at still another grade level the posttest mean fails
to reach the lower limit of the confidence interval, that observation will
appear below the minimal efficiency line; this will suggest that the remedial
program is in disrepair. The whole procedure for reporting information on
program efficiency is depicted in Figure 2.
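
The assignment of each grade level to one of the three areas can be sketched
as follows. The function name and the zone labels are hypothetical, and
treating a mean that falls exactly on a limit as "maintenance" is an
assumption, since the text does not address that case. The rows hold the
observed posttest mean, the expected posttest mean, and the interval
half-width reported in Table 1.

```python
def classify_zone(observed, expected, half_width):
    """Place an observed posttest mean relative to its confidence interval."""
    upper = expected + half_width  # optimal efficiency line (index = 1)
    lower = expected - half_width  # minimal efficiency line (index = 0)
    if observed > upper:
        return "net growth"
    if observed < lower:
        return "breakdown"
    # Assumption: a mean exactly on either limit is counted as maintenance.
    return "maintenance"


# (observed posttest mean, expected posttest mean, half-width) from Table 1.
TABLE_1 = {
    2: (37.70, 34.75, 9.62),
    3: (32.98, 44.70, 10.20),
    7: (34.02, 25.78, 7.04),
    8: (37.40, 40.12, 6.67),
}

zones = {grade: classify_zone(*row) for grade, row in TABLE_1.items()}
for grade, zone in zones.items():
    print(grade, zone)
```

The outcome is consistent with the efficiency indices in Table 1: grade 7
(index above 1) falls in the net growth area, grade 3 (negative index) in the
breakdown area, and grades 2 and 8 in the maintenance area.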

2 - Making it Practical

In order to make the information he/she generates relevant to the
decision-making process, the evaluator must have a good understanding of that
process. That understanding should be based on empirical evidence about the
overall program environment, and should also be guided by a theoretical
framework. Previous research (Braybrooke and Lindblom, 1963) suggests that
the process of rational decision-making follows four principles. What are
these principles and what do they entail?

1. A decision requires a clear information base.

The information base, which is of course nothing other than previous
evaluation results, may indicate one of three things: a) a given program is
capable of producing net academic growth, i.e., its efficiency index is
greater than 1; b) a given program operates as a maintenance unit, i.e., its
efficiency index is between 0 and 1; c) a given program is experiencing a
breakdown, i.e., its efficiency index is lower than 0.

2. A decision is always inscribed within a general approach to management.

Following Stufflebeam et al. (1971), we distinguish three possible
approaches in an educational setting: a) a homeostatic approach, intended to
sustain the achieved balance in a program; b) an incremental approach, aimed
at "shifting the program to a new balance based upon small serial
improvements" (p. 69); c) a neomobilistic approach geared for a large and
significant change necessitated by critical program conditions.

3. A decision calls for selection or design of specific procedures to be
followed.

This principle really speaks of the planning stage in the process. a)
Planning may consist in simply standardizing or operationalizing the
procedures presently in use. b) Another possibility is to target particular
areas where the need is the greatest, or where resource allocation will be
most efficient. c) Still another alternative is to reorganize a program in all
its aspects, adjusting the objectives, providing new means, redefining
personnel roles, setting check points for accountability.

4. A decision involves translating a set of selected procedures into
activities in order to meet an objective.

Three courses of action may be followed: a) one can continue or recycle
a set of practices proven to be successful; b) one can offer training and
other activities in staff development; c) one can move to enforce or implement
available guidelines/procedures where numerous discrepancies have been found
between a program's objectives and modus operandi.

Stufflebeam et al. insist that the ultimate objective of a rational
decision-making process, similar to the one outlined above, is educational
improvement. While no educator would contest that view, it has been our
experience that a number of immediate goals often supersede the ultimate
objective. These immediate administrative goals fall into three categories:
those aimed at reducing change (transform-goals), those aimed at achieving
control (conform-goals), those aimed at promoting or marketing a particular
program or position for public relations purposes (inform-goals). These
immediate goals, because of the rather quick payoffs associated with them, are
the guiding lights of management. So, the evaluation results must be
articulated to them in order to sensitize the decision-makers. We propose a
restructuring of the decision-making model to reflect that situation. Figure 3
depicts this new structure.

The model establishes a correspondence between each immediate goal and
the type of elements in the decision-making process which it seems most
congruent with. It can be of great utility to the evaluator in formulating
his/her recommendations for program development. Depending on the kind of
evaluation results obtained (i.e., the value of the efficiency index), a
particular administrative approach, some specific planning procedures, and a
set of corrective supportive activities may be suggested. That kind of
detailed, facilitative work has a good probability of catching the attention
of the decision-makers.

Table 1

Statistical Data for Chapter 1 and Nonchapter 1 Students in Mathematics

                                   Grade 2         Grade 3         Grade 7         Grade 8
Parameters                       Treat.   Cont.  Treat.   Cont.  Treat.   Cont.  Treat.   Cont.

 1. Pretest Mean                  32.04   64.80   23.27   60.00   30.64   57.49   30.09   56.83
 2. SD of Pretest                 11.26   14.89    9.91   16.73    8.31   12.01    9.36   14.46
 3. Posttest Mean                 37.70   58.94   32.98   59.13   34.02   55.03   37.40   56.02
 4. SD for Posttest               17.27   19.39   10.95   16.31   11.88   17.00    8.15   14.58
 5. Cutoff Score                      41.90           28.20           38.00           38.00
 6. Pre-Post Correlation                .57             .39             .77             .59
 7. Sample Size (N)                  70      65      64      61      58      59      66      60
 8. Expected Post Mean                34.75           44.70           25.78           40.12
 9. Confidence Interval for (8)       ±9.62          ±10.20           ±7.04           ±6.67
10. Efficiency Index                  +.734           -.034           +1.04            +.21

FIGURE 2 - PROGRAM EFFICIENCY AT FOUR GRADE LEVELS

FIGURE 3 - AN EVALUATION-BASED MODEL FOR DECISION MAKING